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Abstract 

The purpose of this article is to propose ODE based approaches for the numerical evalu¬ 
ation of matrix functions /(A), a question of major interest in the numerical linear algebra. 
To this end, we model /(A) as the solution at a finite time T of a time dependent equa¬ 
tion. We use parallel algorithms, such as the parareal method, on the time interval [0, T\ 
in order to solve the evolution equation obtained. When /(A) is reached as a stable steady 
state, it can be computed by combining parareal algorithms and optimal control techniques. 
Numerical illustrations are given. 
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1 Introduction 

The efficient numerical computation of matrix functions, as well as the solution of matrix poly¬ 
nomial equations (such as Riccati’s), is one of the actual important topic in numerical linear 
algebra: this kind of problem arises in a number of situations such as for the approximation of 
nonlocal operators or of infinitesimal generators as well as in computational control, we refer 
to Em for the computation of pth-roots of matrices, to [25j for the solution of rational matrix 
equations and to the book of Nick Higharn [f% j for a general presentation. 

Let A be a matrix, to evaluate /(A) or f(A)b for a given vector 6, number of strategies have 
been developped: they often use an application of approximation theory techniques to a com¬ 
plex representation of /, defining rational approximation methods, let us cite the recent works 
of Fommer et al da on rational Krylov methods and the references therein. 
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The use of efficients parallel computations of parabolic PDEs have been proposed in e.g. in 
m where the generator exp(tA) is decomposed as a sum of independent polar terms, and more 
recently, by Gander and Guettel [12] with the Paraexp algorithm combining Duhamel’s principle 
and additive identities. The evaluation of exp(A) is an old problem, one can find a nice survey 
of methods in (24] . However, in number of situations, the function / is not known so the tools 
of approximation theory can not be applied. 


In a more general way, differential equations can be used to identify, then to compute, the 
matrix function f(A), as a state of the solution; it can be a solution at finite time as well as a 
stable steady state, when good stability properties are present. 


To compute f(A ) as the solution of an ODE at finite time T, T = 1 for simplicity, an 
homotopy method can be displayed as follows: let us consider the matrix differential equation 

HX 

w = Wt», t e( 0 . 1 ). (1) 

W(0) = x„. 

We look for T such that to have X(l) = f(A). We introduce the homotopy path A(t) = 
Xq + t(A — Xo), we set X(t) = /(4(f)); Xq is chosen in such a way f(X o) is easy to compute. 
We have: 

(2) 

X( 0 ) = x 0 . 

A numerical approximation of f(A) = X(l) can be obtained by using any time marching scheme. 
For example, taking Xq = /, where I is the identity matrix and J~{X) = X(Id — A)X we have 
X(l) = A~ 1 , so Forward Euler method as well as Runge Kutta method can be applied to build 
inverse preconditioners of A, see [3]. 


However, in a number of situations, it can be difficult to model f(A) as the solution of the 
ODE at finite time and it is interesting to link f(A) to a stable steady state, see [3, B|. Defining 
X = f(A) as an asymptotically stable root of F, say F(X) = 0 X = f(A) and assuming 

that the eigenvalues of the differential of T at f(A), DF(A), are of strictly negative real part, 
we can consider the differential equation 

(3) 

x(0 ) = A-,,. 

This modeling allows the computation of sparse approximations to f(A) by defining an as¬ 
sociate sparse matrix ffow, see [3]. 

Hence, if a differential system possesses F(A) as an asymptotical steady state, it is possible 
to compute numerically F(A) by applying an explicit time marching scheme. The efficiency 
of the method depends both on the dynamical properties of the differential system and on the 
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numerical scheme in time. 


Parallel methods in time have been introduced and developed as a mean to solve time de¬ 
pendent problems using parallel computing, they apply to the computation of the solution of 
an ODE at a given finite time. These methods where especially devoted to the computation 
of f(A)b, where b is a given vector in W 1 Let us cite the recent Paraexp and Rational Krylov 
methods uaina, to name but a few. However, it must be noticed that these methods are direct 
but not well adapted to the computation of f(A), furthermore they assume to have a knowledge 
of some rational expansions of f(z), z E C which is not always the case. The Parareal Method 
(PM) is iterative and it can be seen as a multi-steps shooting scheme, where each step can 
be solved in parallel. Firstly designed for parabolic-like problems, PM have been successfully 
adapted to second order evolutive PDE, see mmm but few attention was given for their 
application in numerical linear algebra and the main purpose of this article is to show that 
interesting issues can be derived for evaluating f(A). 

In this article, we consider a framework in which the parallel computation of functions of 
matrices can be applied, we focus on PM for which we propose adaptations and implementa¬ 
tions for computing numerical approximations of matrix functions. The article is organized as 
follows: in Section [2] we recall some parallel in time methods including the classical parareal 
algorithm which we apply to solve equation Q in order to compute the matrix function f(A) 
of a matrix A. We numerically illustrate the algorithm by approximating the inverse and the 
exponential of a matrix. In Section [3j we propose a version of the modified parareal algorithm 
seen as a multiple shooting method as described in [ 13] , adapted here to the matricial compu¬ 
tations. Using this algorithm, we compute the cosine of a matrix and we compare the results 
obtained with this algorithm to the case when the cosine is obtained using the classical parareal 
algorithm. We can notice that the modified parareal algorithm produces an acceleration for 
the convergence. In Section [4] we consider the case when the matrix function is found as the 
stable steady state of an ordinary differential equation. We propose some methods allowing to 
accelerate the convergence in time, in order to efficiently compute the steady state. Section [5] 
presents a particular acceleration procedure in order to converge to the steady state. The idea 
is to obtain the matrix function as a solution of an optimal control problem and to apply a 
parareal in time method in order to solve the control problem, as proposed by [22]. 

All the numerical computations presented here have been made with Matlab. 


2 Parallel algorithms applied to the computation of matrix func¬ 
tions 

Before focusing on the application of the parareal method to the numerical evaluation of func¬ 
tions of matrices, we recall briefly some special parallel algorithms in time for solving numerically 
the equation: 


and where one needs to be able to evaluate exp(tA)6 for a certain given vector b. 

A first idea in solving Q consists in identifying a generator and to use a proper expansion 
of the underlying function. Namely, following DU we need to approximate the exact solution of 
Q which reads as: 

u(t) = uq + A _1 (exp(fA) — Id)(Auo + g). (5) 

Introducing < j)(t) = e ~ , we have u(t) = uq + tcj)(tA)(Auo + g). At this point, we consider the 
polar decomposition of cj)(s) on the spectral interval of tA as 

(j){z) ~ r(z) = Y^ ^ (6) 

1=1 s i z 


Hence, the vector u at time t is computed as: 

v 

u(t) ~ uo + ^ ~^tujj(sjld — tA)^ 1 (Auo + g ). (7) 

j=i 


This last sum can be built in parallel on p processor by solving p independent linear systems 

p 

(Sjld — tA)vi = Auq + g, and taking then the linear weighted combination E UJjVj. 

3 = 1 

Another approach to solve Q was proposed by Gander and Guettel: the paraexp method. 
It consists in applying the Duhamel principle writing u = v + w where 


dv 

Tt =Au ’ 
u{ 0) = u 0 , 


( 8 ) 


and 


dw 

m =Aw + 9 ' 

w{ 0) = 0. 


(9) 


Then, decomposing the time interval as [0, T] = U^ =1 [7}_i,7}] with 0 = To < T± < T 2 < ■ ■ ■ < 
T p = T, Gander and Guettel define the sequences Vj and Wj as 

Step 1 For j = 1, • • -p solve numerically with serial integrator the Cauchy problem 

= Avj(t) + g(t),t £ [Tj-i.Tjl.VjfTj-i) = 0 


Step 2 For j = 1, ■ ■ -p solve numerically with an acurate (nearly exact) propagator the Cauchy 
problem 

w'j{t) = Awj(t ), <E [Tj_ ll T\,w j (T j _i) = 


4 




Finally the solution is obtained as u(t) 
details. 


k 

Vk(t ) + for t <E [T fc _i,Tfc], see QJJ for more 

i=i 


This last method outperforms the parareal method for solving the original Cauchy problem. 
Anyway, as stated in the introduction, it can not be applied in all situations, this is why we first 
recall the principle of the PM and apply it for computing the matrix f(A), we do not consider 
here the problem of evaluating f(A)b. 

We now concentrate on PM. First introduced for the parallel numerical solution of parabolic 
problems, we show how to adapt them to the computation of function of matrices. We start 
by describing the parareal algorithm that we intend to use. The method was first introduced 
by Lions, Maday and Turinici in [21] and the main goal of this method is to propose a time 
discretization of a differential evolution equation that allows for parallel implementation, in order 
to accelerate the computation of the solution, compared to the use of a sequential method. In 
what follows we introduce the parareal algorithm in the form of a multiple shooting method, as 
it was described by Gander and Vandewalle in m- The purpose of the method is to compute 
in parallel a numerical solution for ordinary differential equations of the form: 

u'(t) = t e[0,T], 

u( 0) = u 0 . 

The parareal method uses two propagators: an inexpensive coarse propagator G(T n +i, T n . x) 
which gives a rough approximation to u(T n ), where u is the solution of equation ( |10| ) having 
u(T n _ i) = x as initial condition, and an expensive fine propagator F(T n+ i, T n , x) which gives a 
more accurate approximation to the same solution u{T n ). Partitioning the time domain (0,T) 
into IV-time subdomains Q n = (T n ,T n+ 1 ), the algorithm works as follows: 


Algorithm 1 The classical parareal algorithm 
1: Step 0: The algorithm starts with an initial approximation U®, n = 0,..., IV, 
which can be found for example by using the coarse propagator: 

[/° +1 = G(T n+1 ,T n ,C/0), Uq = uq. (11) 


2: Step k+ 1: 

• Compute in parallel F {T n+ \ , T n ,U%), for n = 0,..., N — 1. 

• Perform a correction step, using both the coarse and the fine propaga¬ 
tors: 

= G(T n+ i, T n , U* +1 ) + F (T n+1 ,T n , UZ) - G(T n+1 ,T n , U%). (12) 


Remark 2.1 We note that the initial step © of the algorithm is sequential but not expen¬ 
sive, since it uses only a coarse propagator. At Step k + 1, correction (12) provides a more 
accurate solution by using the fine propagator. The significant advantage of this method, com¬ 
pared to a sequential method using the fine propagator, is that all the expensive computations 
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F(T n+ i. T n , Uf) are computed in parallel and thus correction (12) provides a solution as accurate 


as the one obtained by the sequential use of the fine propagator F but at a cost comparable to a 
sequential computation using the coarse propagator G. 

In order to be able to apply this method to the computation of a matrix function f(A), with 
A a matrix, we need to be able to see f(A) as the solution at a certain time T of an ordinary 
differential equation. Thus, we introduce the following matrix function: 

A{t) = X 0 + t(A-X 0 ), (13) 

with Xo which can be chosen later and we define Xft ) = f(A(t)). The matrix function X sat¬ 
isfies the following ordinary differential equation ([2]). We can note that f(A) = X(l) and thus 
the numerical computation of f(A) reduces to finding a good approximation for the solution of 
|2]) at time T = 1. 

Example 1: The inverse A of a matrix A 

Let A be a matrix belonging to A4 n (R) and I the identity matrix in _M n (R). We define 
A(t.) = I + t(A — I) which corresponds to taking Xq = I in (13). We assume that Aft) is 
an invertible matrix for all t in [0,1]. Then, the matrix Qft) = Ait) -1 satisfies the following 
ordinary differential equation: 


^ = -Q(t)(A - I)Q(t), 

Q«>> = I, 


tel , 


(14) 


and A~ l = Q{ 1). Our purpose now is to find an approximate solution for the solution Q of (15) 
at time T = 1. 

The idea of finding the inverse of a matrix as the solution at a certain time T of an ordinary 


differential equation was developped by Chehab in |3j. Here the difference is that we solve (15) 
by the parareal algorithm described above. 

In Figure [l] we numerically illustrate the computation of the inverse of the matrix, using the 
parareal method. We compare the result obtained by applying to (15) the classical parareal algo¬ 


rithm to the inverse of the matrix and respectively to the solution obtained by using sequentially 
a fine propagator. 

For the numerical simulations, we first divided the time interval [0,1] into IV = 25 inter¬ 
vals [T n , T n+ 1 ], obtaining thus a coarse grid. Each interval was also divided into J = 200 
subintervals, obtaining a fine grid. For the coarse and fine propagators, we used an Euler 
explicit method. The norm we considered in order to estimate the error at each iteration 
is norrn(-) = max.y(abs(-))/maxjj(abs(H)), where A e A4 n (R) is the matrix we invert and 
i, j = 1,..., n. The matrix A that we consider in this example is the ID Laplacian belonging to 
A4so(R)- We actually compute the inverse of the matrix A/2 10 -, we recover H” 1 by multiplying 
the result by 2 10 . 

Example 2: The exponential exp(.4) of a matrix A 

Let A be a matrix belonging to _M n (R) and / the identity matrix in A4 n (R). We define Aft) = 
tA, then the matrix Qft) = exp(A(t)) satisfies the following ordinary differential equation: 


f = *»<*>• 

Q(o) = /, 


tel, 


(15) 
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Figure 1: Error between the solution given by the parareal algorithm and the inverse of 
the matrix (left); Error between the solution given by the parareal algorithm and the solution 
obtained using sequentially a fine propagator (right) 


and exp(A) = Q( 1). As previously, we can find an approximate solution for Q of (15) at time 
T = 1, by using the parareal algorithm. 

In Figure [ 2 J we illustrate the computation of exp (B), where B = —A/2 10 and A is the 
Laplacian matrix that we used in the previous example on the inversion of a matrix. We 
consider the same coarse and fine grids as in the previous example and the propagators are 
obtained using a Crank-Nicolson method. The norm we consider in order to estimate the error 
at each iteration is norrn(-) = maxjj(abs(-))/maxjj(abs(exp(i?))), where B E ,M ri (R) is the 
matrix for which we compute the exponential and i,j = 1 ,,n. We refer the interested reader 
to [25J for a review of methods to compute the exponential of a matrix. 




Figure 2: Error between the solution given by the parareal algorithm and the exponential of 
the matrix (left); Error between the solution given by the parareal algorithm and the solution 
obtained using sequentially a fine propagator (right) 
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3 The modified parareal algorithm adapted to the matriceal 
computations 


In what follows we adapt the modified parareal algorithm (see m,m for the modified parareal 
algorithm, see also 0,121) to the matriceal case, in order to be able to use it for the computation 
of matrix functions. 

We start by introducing the Kronecker product and the o product. 

For two matrices A and B belonging to .M n>s (R), we define the inner product < A, B >f= 
tr{B T A), where tr(M ) is the trace of a matrix M. The associated norm is the Frobenius norm 
denoted by || ■ \\p. 

Definition 3.1 A system of matrices of A1„ !S (R) is said to be F-orthonormal if it is orthonor¬ 
mal with respect to the Frobenius inner product < •, • >f- 


We also introduce the o product, defined by: 


Definition 3.2 Let A = [A\. A 2 ,..., A p ] and B = [B±, B 2 ,.. ., Bj\ be matrices belonging re¬ 
spectively to M n .p S (R) and A i n ,i s (R), where A i, Bj belong to Ai njS (R ) for all i = 1,... ,p and 
j = 1 ,,1. Then, the p x l matrix A T o B is defined by: 


A T oB 


(< Ai, B\ >p < A\, B'2 >f 
< A2 , Bi >f < A2, B2 >F 


< A\,Bi >i?^ 

< A 2 ,Bi >f 


\< A p , Bi >f < A p , B 2 >f 


< A p , Bi >f) 


We recall here the following properties for the o product; for more details we refer the reader 
to [2]. 


Proposition 3.3 1. If s = 1, then A T o B = A T B. 

2. The matrix A = \Ai, A 2 ,..., A p ] is F-orthonormal if and only if A T o A = I p . 

3. IfX&M n , s (R), then X T oX = \\X\\ 2 f . 


In what follows, we also need to introduce the global QR-factorization algorithm (see [2] for 
more details on the method): 


The global Q-R-factorization algorithm globalQR(-) 

Let Z = [Z\, Z 2 , • • •, Zf,] be a matrix of k blocks, where Z* E Al niS (R), with i = 1,..., k. We 
search for an R-orthonormal family of matrices {Qi, Q 2 , ■ ■ ■, Qi} such that Span{Q\, Q 2 , ■ ■ ■, Qi} = 
Span{Z\, Z 2 , ..., Zk} and < Qi, Q 1 >f= 1 for all i, with < Qi, Qj >f= 0 for all i, j such that 
i / 3■ 




Algorithm 2 The globalQR algorithm 
1: Define 

• R — (j'i,j')i,j=l,...,k — Ok 

• rn = \\Zi\\ F 

• Qi = Zi/m 

2: for i = 2,..., k do 

• Q = Zi 

• for j = 1 ,,i — 1 do 

• j.i —^ Q, Qj f , Q — Q 

• end for j 

• m = \\Q\\ f 

• If m / 0, then Qi = Q/m 

Else Qi = Q 

3: end for i 

4: If m = 0, eliminate Qi from the family {Qi, ..., Qk} and eliminate 
the i-th line and column from R 


The result of the algorithm is [Q,R] = globalQR(Z). 

Proposition 3.4 1. The algorithm (1) — (4) provides the matrix Q = [Q \,... ,Qk\ such that 

Z = [Z 1? ..., Z k \ = Q(R <S> I s ), 

where 


Q t o Q = 


the coefficient 0 on the diagonal of Q T o Q corresponding to the i-th position on which 
ru = 0. 

2. The algorithm (1) — (5) provides a basis {Qi,..., Qi} in Span{Z \,..., ZQ- 

Using the global QR-factorization algorithm introduced above, we are now able to construct 
the projection on the space generated by a family of matrices. 


The projection on a space generated by a family of matrices { Z \,..., Z *.} 

Let {Z i,..., Zfc} be a family of matrices. By applying the global QR-algorithm described 
above to the family {Zi,..., ZQ, we obtain {Qi,..., Qi} to be a family of F-orthonormal 
generators of Span{Z\ ,..., Zjff. 
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We define Q = \Q \,..., Qi]. Then, Q T o Q gives a vector (ai,..., cq) which defines the 
projection PqY of Y on the matrix space Span{Z\, .. . , Zf.} = Span{Q i,. .. , Qi}: 

i 

PqY = YaiQu 

i=l 


with (ai,..., ai) T = Q T o Y. 

The modified parareal algorithm 

Taking into account these tools, we are now able to adapt the modified parareal algorithm 
described in m to the matrix case, in order to be able to solve in parallel the following evolution 
equation: 


U'(t) = f(U(t)), 
U{ 0) = Uo, 


t G [0, T], 


(16) 


with U : R -)• At„, p (R), U 0 G 7W niP (R) and / : A^ niP (R) -)• jVf n , p (R). 


We will start by studying the linear homogeneous case U'{t) = BU(t), with B G A / f n ,n(R) 


The linear homogeneous case 

The parareal algorithm that we present here is a multiple shooting method using two propa¬ 
gation operators. As for the classical parareal algorithm, we start by splitting the time interval 
[0,T] into IV-time subdomains = (T n _i,T n ), with n G {1,...,JV} and T n — T n _\ = A T n . 
As previously, let G(T n , T n _i, x) be a coarse propagator which gives a rough approximation to 


u(T n ), where u is the solution of equation (16) having it(T n _i) = x as initial condition and let 


F(T n ,T n _i,x) be a more expensive fine propagator which gives a more accurate approximation 
to the same u(T n ). The coarse propagator can be any sequential method while the fine propa¬ 
gator can be obtained by applying a sequential numerical method on the time interval Q n which 
was partitioned into J subintervals of size A t n (obtaining thus a fine grid for the whole time 
interval [0, T]). The algorithm works as follows: 
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Algorithm 3 The modified parareal algorithm for linear homogeneous problems 
1: Step 0: The algorithm starts by constructing an initial approximation U®,n € 
which can be obtained for example by using the coarse propagator sequentially: 

<+1 = G(T n+ i,T n , U%), Uq = U 0 . (17) 

We also define the space as the empty space. 

2: Step k + 1: 

• We enhance the subspace S k = Span{S k ~ 1 U {U k , n = 0with {U k } n 
known from the previous step. 

• We find a basis in S k using the global QR-algorithm. 

In order to do so, we construct 

Z k = [S k ~\U k ,...,U k N \, 

where is the matrix having as columns the basis of 5 fc_1 . Applying the 

global QR-factorization, we find the matrices Q k and R k such that [Q k ,R k ] = 
globalQR(Z k ). 

Remark 3.5 We note here that Q k = [Q k ~ 1 ,Q k 1 , ■ ■ ■, Q k l \ since the basis in S k is 
obtained by completion of the basis in S k ^ x . 

• We compute in parallel the evolution of {Q k }j t through the fine propagator: 

F(T n+l ,T n ,Q k .), ji € {ji ,..., ji}, n£{0,...,JV-l}. 

Remark 3.6 At this step we only need to compute the evolution of Q k , with ji € 
{j i,..., ji} since the evolution of all the other elements of the basis is known from 
previous steps. 

• We can now perform the sequential update: 

U k X\ = F(T n+1 ,T n , P k U k+l ) + G(T n+ i,T n , (/ - P k )U k+1 ), (18) 

with P k the projection on the space S k . 

In order to compute F(T n+ i,T n , P k U k+1 ), we first compute 

[ui\ 

(Q k ) T oU k+1 = : , 

\ a J 

and thus P k U k+1 = J2i a iQi- Then, the evolution of P k U k+1 through the fine 
propagator can be found as: 

F{T n+1 ,T n ,P k U k+1 ) =J2»iF(T n+l ,T n ,P k Q k ). 
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Remark 3.7 1. The only sequential computation in (18) is G(T n+ i,T n , (I — P k )U k+1 ) which 

is a cheap computation since it uses only a coarse propagator. No extra evaluation using 
the fine propagator is required in (18) since the fine evolution on the space S k is already 


known and has been computed in parallel. 


2. Note that if we formally take k —> oo in (18), then the projection P k tends to the identity 
matrix and thus U k+1 tends to the approximate solution given by the fine propagator. A 
rigorous explanation for this phenomenon can be found in m 


The linear inhomoneneous case 

The algorithm for the inhomogeneous case follows the same idea as in [13], [14] . Denoting by 
F the matrix that defines the fine propagator, we can notice that F(T n+ i,T n , aX + /3Y) can be 
computed as follows: 

F (T n+1 ,T n , aX + f3Y) = aF J X + (3F J Y + F (T n+1 ,T n , 0) 

= a(F(T n+1 ,T n ,X) - F(T n+1 ,T n ,0)) +/3(F(T n+1 ,T n ,Y) - F(T n+1 , T n , 0)) (19) 

+ F(T n+ i, T n , 0). 

Thus, the affine computation for F(T n+ i,T n , P k U k+1 ), and similarly for G(T n+ i, T n . ( I—P k )U k+l ), 
requires only an additional computation, which leads to the following algorithm: 


Algorithm 4 The modified parareal algorithm for linear inhomogeneous problems 
1: Step —1: We start the algorithm by computing the quantities F(T n+ i, T n , 0) and 
G(T n+ i, T n , 0). These quantities will be useful for the affine computation of 
F(T n+1 ,T n ,P k U k+1 ) and G(T n+ i,T n , {I - P k )U k+1 ) at step k + 1. 

2: Step 0: This step is identical to Step 0 for the homogeneous case. 

3: Step k + 1: 

• Enhance the space S k = Span{S k ~ * 1 U {U k , n = 0,..., N}} as in the homogeneous 

case and compute the matrix Q k = Q k ] ,..., Q k L ] in order to have the projec¬ 

tion P k . 

• Compute in parallel F(T n+1 ,T n , Q*\), f or ]i =j 1 ,...,j l . 

• Update the approximate solution using the formula: 

U k +l = F(T n+1 , T n , P k U k+1 ) + G(T n+1 , T n , (I - P k )U k+l ) - G(T n+1 , T n , 0). (20) 


Remark 3.8 We note here that F(T n+ i, T n , P k U k+1 ) can be computed at minimal cost, since 
PkU k+1 is a linear combination of approximations for which the evolution is known. In fact, 
F(T n _|_i, T n , P k U k+l ) is computed as a linear combination of F(T n+ i, T n , 0), computed at Step 
—1, and ofF(T n+ i,T n ,Q k ), computed in parallel: 

F(T n+1 ,T n , P k U k+1 ) = J2 ai{F(T n+1 ,T n , Q k ) - F(T n+1 , T n , 0)} + F(T n+1 , T n , 0), (21) 

i 

where P k U k+l = a r Q k . 
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Application of the modified parareal algorithm to the computation of matrix 
functions 


The purpose of this section is to propose an alternative method, using the modified parareal 
algorithm, to the computation of matrix functions f(A), such as sin(A), cos(A), A _1 , where A is 
a matrix. As already explained previously, the idea is to see f(A) as the solution at time T = 1 
of a differential equation and to apply the modified parareal algorithm in order to compute the 
solution of the differential equation. 

In order to obtain the differential equation, we can pose A(t) = Xo + t(A — Xq), with Xq 
conveniently chosen, and we define X(t) = f(A(t)). Thus, A'(l) = f(A), so we need to compute 
X(t) at t = 1. Under optimal conditions, the differential equation satisfied by X is ©■ 


Example 3: The computation of sine and cosine of a matrix A 

Let A be a given matrix, we write A(t) = Xo + t(A — Xo) and we define X(t) = sin(A(t)), 
Y(t ) = cos (A(t)). Then, X , Y satisfy the following differential equation: 


(') = 1.4 - AVil-ff), 
A = -(A - X 0 )X(t), 


which can be written in the compact form: 




o 


— {A — Xo) 


A-X o 
0 


( 22 ) 


(23) 


Xq can be chosen for convenience 


with U(t) = (X(t),Y(t)) and B = 
either 0 or I. 

We can now apply the modified parareal algorithm to the linear homogeneous problem (23) 
with the initial condition ?7(0) = (A(0),y(0)). In order to improve the efficiency of the method 
when we compute cos(A), we combine the algorithm proposed by Hargreaves and Higham in 
m and the modified parareal algorithm. Our algorithm reads as follows: 


Algorithm 5 Modified parareal algorithm applied to the computation 
of the cosine of a matrix A _ 

1: A given in Al n (M) 

2: Find the smallest non-negative integer m such that 2 - m PI|oo < 1. 
3: Define the matrix Aq = 2 ~ m A. 

4: Compute Cq = cos(Ao) using the modified parareal algorithm de¬ 
scribed previously. 

5: Recover cos( A) using m times the formula cos(2 M) = 2 cos 2 (M) — I : 

• for i = 0,..., m — 1 do 

. C i+ 1 = 2 Cf-I 

• end for 

• x = c m 


13 







We illustrate numerically these results. In Figure [3j we compare the speed of convergence 
between the classical parareal algorithm and the modified one, both applied to the computation 
of the cosine of a matrix. For the numerical simulations, we first divided the time interval [0,1] 
into N = 10 intervals [T n , T n+ \\. obtaining thus a coarse grid. Each interval was also divided 
into J = 100 subintervals, obtaining a fine grid. For the coarse and fine propagators, we used 
an Euler explicit method. 

We plotted the error measured in the L°°(0, T)-norm between the fine solution, meaning 
between an approximation of cos(A) computed as the solution of the ordinary differential equa¬ 
tion (23) when a sequential method using the fine propagator is used, and the approximation 
produced by the two algorithms at each iteration. We can notice that the modified parareal 
algorithm converges much faster. 




Figure 3: Error for the classical and the modified parareal algorithm, applied to the computation 
of the cosine of a matrix 


4 Acceleration of convergence in time 

4.1 Description of the method 

In a number of cases, the matrix function evaluated at A, f(A), can be defined implicitely as 
the zero of a proper functional J r , i.e., X(X, A) = 0 X = f(A)- it can correspond to a 

(stable) steady state of an ODE: 


(II (24 ) 

X{0) = X 0 . 

So f(A ) can be approached numerically by applying an explicit time marching scheme on a 
sufficiently large time interval [0, T\. This is the case, e.g., with J-(X, A) = I — AX when 
considering the computation of f(A) = A -1 , see [3j. 
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Of course, the efficiency of the numerical method is related to the speed of convergence in 
time to the steady state. A first idea is to speed it up by adding a time dependent term V(t) as 


— =F(Y,A) + V(t),t> 0, 

no) = *o, 


in such a way 


II F(Y(t),A) || = 
t^+oo || F(X(t),A) II ’ 


(26) 


this is in fact a kind of acceleration of convergence in time. An important feature is that 
superlinearly convergent algorithms can be then obtained by applying appropriate time marching 
schemes, as shown ahead in this section with the numerical illustrations. 


Remark 4.1 When choosing F(Y, A) = f — AY, the discretization of the above systems by a 
variable time step-size Backward Euler method gives rise to the iterations 

x i k +i) = x (k) + ak ^ _ Ax (k)\ ) (27) 


and respectively 

y(fc+i) = y(fc) + ak (f_ a y W + V (fc) ) . 


(28) 


= 0 we need to take = 


|| F(Y^ k \A) 

So, in order to have a solution such that lim - tt-t- 

fc^+oo || F{X^ k \A) II 

k k 

- ' ] . —, with r k = f — AY^ and V^ the sequence of the conjugate direction of the 

< r , Ar > 

conjugate gradient method (CGM), thanks to its convergence in a finite number of steps. More 
generally, the solution of linear systems can be modeled by dynamical systems and numerical 
algorithms can be derived by numerical time integration, see W- 


We go back to the continuous acceleration of convergence condition (26). Let us consider for 
the maximum of simplicity the scalar differential equation 


dx 

— = b — ax, 
dt 

x(0) = x 0 , 


(29) 


where a > 0. For any initial datum xq, the solution x(t) converges to x* = ^ and we have 

x(t) = e~ at xo + b- —4-. Now we want to tune the equation in such a way the convergence to 

x* is faster. This means to determine u(t) such that the solution y of the ODE 


— = b-ay + u(t), 

2/(0) = x 0 , 


( 30 ) 
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ay(t ) — b \ 

satisfies lim --—- r = 0. A trivial computation gives 

t- 1 +oo ax(t) - b\ 


ay(t) - b 


e as u(s)ds 


— 1 "f CL- 


ax{t) — b ' ax o — b 

Hence setting u(s) = e as u(s), the last condition reads 

r f f i b-ax o 

nm / v(s)as =-. 

t —»+oo Jq CL 

Of course the choice of v(t) is not unique. In particular the speed of convergence of y to x* 
depends on the speed of convergence of the integral to the limit -— ® x ° ; compactly supported 
functions v, which allow to reach the limit in finite time, can be good candidates. 


This little computation can be applied in higher dimension considering a diagonalizable 
matrix A as 


and 


X(0) = X 0 , 


d l = b-AY + u(t), 


(31) 


(32) 

Y( 0) = Xq. 

If D denotes the diagonal matrix containing the eigenvalues of A and P the associated passage 
matrix, we obtain as a condition of acceleration of convergence 


lim 

t —^ "^OO 


e XiS Ui(s)ds = 


'o 


bi \jX\ 




A ? ; 


which reads as (26) choosing an appropriate norm. We have used here the notation X = P l X 
and D = P~ 1 AP, D = diag{ Ai, • • • , A n ), so we recover the condition lim / v(s)ds = X* — Xq, 


t —^-)-oo 


with v(s) = e sA u(s ). 


'o 


The above computation can be extended in the nonlinear case as following: 

X(0) = Xq. 


(33) 


We assume that T is differentiable at its root X* and that DP(X*) is diagonalisable and that 
its eigenvalues of are of strictly negative real part. We assume in addition that Xq is chosen in 
a suitable neigborhood of X* and consider the two linear problems 

( 34 ) 


6X(0) = Xq - X*, 
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and 


dSY 

dt 


= DJ r (X*)SY + U(t), 


<w(o) = x 0 -x*. 

We deduce from above a way to compute U(t) as 


(35) 


lim f e sDX{x * ) U(s)ds = X* - X 0 . 
■^+°° Jo 


t —^-(-OO 

The function Y(t) = X° + 5Y(t) is expected to converge to X* faster than X(t), at least locally. 
Of course X* is unknown but it can be estimated during the numerical computation. 

Remark 4.2 We can write the acceleration convergence condition as 

[ t e sM U(s)ds = X* -X 0 + £(t), 

Jo 

with lim £(t) = 0; here M is a given matrix, it can be A as well as DF(X*), depending on the 

t —>+oo 

context. We must impose £(0) = —{X* — Xo) and write £(t) = / rj(s)ds — ( X * — Xo). Finally, 

Jo 

f e sM U (s) — rj(s)ds = 0 Vt, 

Jo 


we have 


so 


U(s) = e sM r](s). 

This very simple computation shows how to rely the speed of convergence given by f(t) then by 
rj(t) to the accelerator U (s). 

4.2 Practical implementation and illustration 
Example 4: The linear vectorial case 

We first consider the linear problem in M n and to illustrate the convergence acceleration, we 
choose as control function u(t) = x\o il {t)e~ tA (X*—X o) in such a way lim / e sA u(s)ds = X* — Xo. 

t-s>+oo J q 

For this choice of u we can compute the speed of convergence f(t) of Y(t) to X*. Indeed, we 
have r)(t) = X[ 0i i](t)(X* — Xo). Hence £(i) = min{(t — 1),0)(X* — Xo). 

Of course, the definition of u needs the solution X* to be known and we will rather use an 
approximation of this function by replacing X* with an approximation X obtained, e.g. by 
solving a system MX = 6, where M is a preconditioner of A. 


Algorithm 6 Simple Gradient Acceleration 

1: X(°) = Xo given in M n 

2: X ~ X* is approached by solving MX = b 

3: e = X — X(°) and u = e 

4: for k = 0, 1,.' • • until convergence do 

5: Set X( fc+1 ) = X( fc ) + A t(b - AX^ + u 

6: Set u( fc+1 ) = (Id - AtA) U W 

7: end for 
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Numerically, the limit is reached for a small time t*, that can be determinated. It can be a 
starting point for implementing the PM method: a coarse solution is first computed to determine 
t* and X. Then the PM process can be displayed combining fine and coarse propagators. 
Similarly, we can consider the acceleration of the steepest descend method by taking a sequence 
of variable time steps as follows: 


Algorithm 7 Steepest Descent Acceleration 
1: = Xq given in M n 

2: X ~ X* is approached by solving MX = b 
3: e = X — and = e 
4: r® = b-AX(°l 

5: for k = 0, 1, • • • until convergence do 

<* AD AD \ 

<,: A/ = < Ar<- k ),r( k ) > 

7: Set X( fc+1 ) = + At(rW + uW) 

8: Set u( fc+1 ) = (Id - A tA) U W 

9: Set — A t,A(A k ' > + u^) 

10: end for 


Below, on Figures 


and 


we reported the history of the ratio 


AY(t) - b 


versus the time, 


AX(t) - b 

when using respectively tlie simple gradient method and the steepest descent method. In both 
cases, the acceleration procedure allows a much faster convergence to the limit. It has to be 
pointed out that the underlying idea is based only on dynamical system arguments and not on 
linear algebra ones. 




Figure 4: Acceleration of the convergence in time for the Heat equation n = 127, A t = 
Xq = 0, b = 1, M is built as an incomplete LU factorization of A 
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Figure 5: Acceleration of the convergence in time for the Heat equation with steepest descent 
method n = 127, Xq = 0, b = 1, M is built as an incomplete LJJ factorization of A 


Example 5: Computing the inverse of a matrix 

We now consider the computation of the inverse of a matrix A. We first describe the formal 
procedure in the general case, we will consider sparse approximations in a second time. The 
ODE to be considered here is 


dX 

dt 


Id - AX, 


X(0) = X 0 , 


(36) 


for A, e.g. SPD, we have X(t) —> A 1 as t —> +oo. 


Algorithm 8 Acceleration for inverse matrix approximation 
1: X^ = Xq given in _A/f„(M) 

2: X ~ X* is approached by solving MX = Id 
3: e = X — and = e 
4: for k = 0,1, • • • until convergence do 
5: Set A( fe+1 ) = X^ + A t(Id - AX^ + „(*)) 

6: Set = (Id - AtA) U W 

7: end for 

We illustrate here the acceleration obtained by using the accelerator U in two situations. In 
Figure [6j we start from an approximation of the initial error e ~ A~ 1 — Xq, obtained by a 
threesholding of the coefficients of A at a level of 1 percent; A being the Laplacian finite 
differences matrix on a square, rescaled by its Frobenius norm (A = tt -^-n -—). We observe 

clearly the acceleration property of the method. In Figure [7| we started from the exact initial 
error e = A -1 — Xq and we observed a much important acceleration effect, this traduces a 
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sensibility. Here n is the number of discretization points in each direction of the domain, the 
matrix A is then of size n 2 x n 2 . 




Figure 6: Acceleration of the convergence in time for inverse of a matrix n = 63, Ao = 0,6 = 
1,A t = 0.1 and e ~ A -1 — Xg 




Figure 7: Acceleration of the convergence in time for inverse of a matrix n = 63, Xq = 0, 6 = 1, 
At = 0.1, and e = A -1 — Xq 
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Application of optimal control techniques for capturing steady 
states 


5.1 Derivation of the method 


A particular acceleration procedure applied to (25) consists in choosing V(t) such that f(A ) is 
reached (in a sense to precise) at a given finite time T; V(t) appears as a control, defining Xy(t) 
as the solution of 


d ^ = X(X v ,A) + V(t), t € (0,1), 

-XV (o) = x Q . 


Hence, Xy( 1) = f(A) or T{Xy(l), A) = 0. This last condition can be relaxed in terms of 
optimal control, choosing the merit function 

J(X V , V) = a II F(X v (l),A) III- +7 [ || V(t) f F dt. 

Jo 

The problem can be rewritten as 


inf J(X v ,V). 

v / < ^X=H x vA)+v{t) 

In [22], Maday and Turinici proposed a parareal method to solve control problem for parabolic 
equations; in practice, they used the PM as a preconditioner of the propagation matrix on the 
fine grid in time; in a recent article [7], Du et al applied the PM procedure as a preconditioning 
of the Schur complement derived from KKT’s optimality relations, for the same type of problem. 


Let us follow [22]: we write for simplicity V(t) = Bu(t ) where B a matrix given in It 

is proposed to rewrite the problem in terms of virtual control by adding a penalty term to the 
merit function 


a 


Je{u, A) — — || X{un-i{T), A) || F +- 


1 /* i i N 1 

1/0 k =0 


) - Afc 


where A = (Ao, • • • , Aat_i) and e > 0 is small and yk is the solution of 

A ) + Bu(t), t, € {T k ,T k+ 1 ), 
UkiJJ-'k ) = A k , 


(38) 


with T k = kAt. The resolution of the minimization inf u J e (u,A) problem can be done by 
applying, e.g., a gradient method to J e (u, A) as follows 


< +1 =< 
A-+ 1 =A l 


p(u% + B*p™), 

pIm-Hm- 1 )*]^^)-^!^-))), 
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where p > 0 is the descent parameter (a conjugate gradient method could be also considered) 
and where p k are the solutions of the adjoint equations 


- d ^ r 1 = F(p N -i,A), t e (T k ,T k+1 ), 

Pk(T k+1 ) = aJ : {y N - 1 ,A), 

and for k = N — 2, N — 3, • • • ,0 

- d ^=r( p k ,A), te(T k ,T k+1 ), 

Pk(T k+ 1) = ■(y k (T k+1 ) - Afc+i); 

M and M are respectively the fine and the coarse propagators of the PM method: 


/ Id 



\ o 


0 

Id 0 
-F Id 

—F 


0 ^ 

and M 
Id 


I Id 0 • • • 0 \ 

-G Id 0 

0 -G Id ••• 

\ o • • • -G Id ) 


(39) 


(40) 


Here F and G represent the fine and the coarse numerical propagators, on the time intervals 
[T k , T k+ 1 ], k = 0, • • • N — 1, which we have not specified since there is no ambiguity. 


5.2 Sparse case 

With the same framework, it is also possible to compute sparse approximations. To this end, 
we add an additional condition on the matrix coefficients. Let H = {(i,j) £ I x J}, where I 
and J are subset on indicies of {1,..., n}. The problem reads as 

inf J(X v ,V). 

y/^|L=J r(x v ,A)+V(t), x v gh 
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